Abstract


We say that two sequences x and w of length m are order-isomorphic (of the
same “shape”) if w[i] ≤ w[j] if and only if x[i] ≤ x[j] for each i, j ∈ [1, m]. We
present a simple linear time algorithm for checking if a given sequence y of length 
n contains a factor which is order-isomorphic to a given pattern x. A factor is a 
subsequence of consecutive symbols of y, so we call our problem the consecutive 
permutation pattern matching. The (general) permutation pattern matching 
problem is related to general subsequences and is known to be NP-complete. 
We show that the situation for consecutive subsequences is significantly different 
and present an O(n + m) time algorithm under a natural assumption that the 
symbols of x can be sorted in O(m) time, otherwise the time is O(n + m log m).
In our algorithm we use a modification of the classical Knuth-Morris-Pratt string 
matching algorithm. 
1. Introduction


The problem of consecutive permutation pattern matching is a natural ex- 
tension of the classical permutation pattern matching and a special variant of 
the so-called generalized permutation patterns. Several combinatorial results 
for this problem were known, see e.g. Elizalde and Noy [9], Warlimont [17, 18]; 
see also chapter 5 in [12]. However, there was no previous study of algorithmics 
of this problem. We present a linear time algorithm for consecutive permutation
pattern matching. 

Patterns in permutations are actively studied mostly from the combinatorial 
point of view. This field of study is concentrated on pattern avoidance, that 
is, counting the number of permutations not containing a subsequence which is 
order-isomorphic to a given pattern. Knuth considered permutations avoiding 
the pattern 312 [13], Lovász considered permutations avoiding the pattern 213 
[14], and Rotem those that contain neither 231 nor 312 [15], to mention just a
few of the most famous examples.

There are several algorithmic results related to pattern matching in permu- 
tations. Bose et al. [4] showed this problem to be NP-complete. Denote by 
m and n the length of the pattern and the text. A general algorithm with 
O(n^{0.47m+o(m)}) time complexity was given in [1], and an O*(1.79^n) time
algorithm was recently given in [6]. For several special cases polynomial time
algorithms are known. In [4] an O(mn^6) time and O(mn^4) space algorithm for
the case of a separable pattern is given. A permutation is separable if it avoids
the patterns 2413 and 3142. Afterwards, Ibarra [11] improved this result to
O(mn^4) time and O(mn^3) space. If both the text and the pattern avoid the
permutation 321, an O(m^2 n^6) time algorithm is known [10]. Note that the case
of an increasing pattern can be reduced to searching for the longest increasing 
subsequence, which can be done in O(n log log n) time for permutations [7]. An- 
other simpler case, when the permutation pattern has length 4, was shown in 
[2] to be solvable in O(n log n) time. 

Generalized permutation patterns (also called vincular patterns, see [12]) 
were introduced by Babson and Steingrímsson [3] and have proved to have
connections to a variety of other combinatorial structures, see the survey [16]. A
generalized pattern is a sequence in which two adjacent symbols may or may 
not be separated by a dash. The absence of a dash between two adjacent sym- 
bols in a pattern imposes an additional requirement that the corresponding 
symbols in the text must be adjacent. Thus an ordinary permutation pattern 
p_1 p_2 p_3 ... p_k corresponds to a generalized pattern of the form p_1-p_2-p_3-...-p_k.
On the other hand, a dashless generalized permutation pattern represents a 
consecutive pattern, that must form a factor of the text (less common names: 
segmented pattern, segmental pattern, subword pattern, see [12]). Combinato- 
rial properties of consecutive permutation patterns were considered in [9, 17, 18]. 
No previous algorithmic results related to consecutive patterns were known (as 
for the generalized patterns, only a W[1]-completeness result was given in [5]).

We present a linear time algorithm for permutation pattern matching of 
consecutive patterns. Our algorithm is based on a simple, yet non-trivial, mod- 
ification of the Morris-Pratt pattern matching algorithm for strings. 


2. Order-isomorphism 


We consider sequences over an integer alphabet Σ, x ∈ Σ*. The positions in
x are numbered from 1 to |x|. Two sequences x, y of the same length are called
order-isomorphic (or simply isomorphic), written x ~ y, if

$$(\forall\, 1 \le i, j \le |x|) \quad x[i] \le x[j] \iff y[i] \le y[j].$$


For example, 4 1 4 7 3 5 2 3 4 ~ 8 1 8 10 6 9 4 6 8. In this section we show a linear
time algorithm for checking isomorphism of two sequences. 
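
As a baseline for the faster method developed in this section, the definition can be checked literally by comparing all pairs of positions. The following is a minimal quadratic-time sketch; the function name `order_isomorphic` and the use of 0-indexed Python lists are assumptions of this example, not notation from the paper.

```python
def order_isomorphic(x, y):
    """Check x ~ y directly from the definition, in O(m^2) time."""
    if len(x) != len(y):
        return False
    m = len(x)
    # x[i] <= x[j] must hold exactly when y[i] <= y[j], for all pairs (i, j).
    return all((x[i] <= x[j]) == (y[i] <= y[j])
               for i in range(m) for j in range(m))


x = [4, 1, 4, 7, 3, 5, 2, 3, 4]
y = [8, 1, 8, 10, 6, 9, 4, 6, 8]
print(order_isomorphic(x, y))            # True
print(order_isomorphic([1, 2], [1, 1]))  # False: a tie in y with no tie in x
```
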
For i = 1, ..., |x| define:

$$\mathrm{LMax}_x[i] = j \quad \text{if} \quad x[j] = \max\{x[k] : k \in [1, i-1],\ x[k] \le x[i]\},$$

and LMax_x[i] = 0 if there is no such j. Similarly, define:

$$\mathrm{LMin}_x[i] = j \quad \text{if} \quad x[j] = \min\{x[k] : k \in [1, i-1],\ x[k] \ge x[i]\},$$

and LMin_x[i] = 0 if no such j exists. If several equally good values of j exist,
an arbitrary one can be selected (we select the greatest such j). The
LMax and LMin tables are called location tables, see Table 1. If the sequence x is
clear from the context, we omit the subscript in the notation.


i        | 1  2  3  4  5  6  7  8  9
x[i]     | 4  1  4  7  3  5  2  3  4
LMax[i]  | 0  0  1  3  2  3  2  5  3
LMin[i]  | 0  1  1  0  3  4  5  5  3


Table 1: The location tables for the pattern x = 4 1 4 7 3 5 2 3 4.
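
The definitions of LMax and LMin can be evaluated directly by scanning all earlier positions. The sketch below does this in quadratic time for the pattern of Table 1, read here as 4 1 4 7 3 5 2 3 4; the function name `location_tables_naive` and the 1-indexed output lists (with an unused entry 0) are assumptions of this example. Ties are resolved towards the greatest position, as in the text.

```python
def location_tables_naive(x):
    """Return (LMax, LMin) as 1-indexed lists; entry 0 is unused."""
    m = len(x)
    lmax = [0] * (m + 1)
    lmin = [0] * (m + 1)
    for i in range(1, m + 1):
        for k in range(1, i):  # earlier positions, scanned left to right
            # Candidate for LMax: x[k] <= x[i]; '>=' keeps the greatest k on ties.
            if x[k - 1] <= x[i - 1] and (lmax[i] == 0 or x[k - 1] >= x[lmax[i] - 1]):
                lmax[i] = k
            # Candidate for LMin: x[k] >= x[i]; '<=' keeps the greatest k on ties.
            if x[k - 1] >= x[i - 1] and (lmin[i] == 0 or x[k - 1] <= x[lmin[i] - 1]):
                lmin[i] = k
    return lmax, lmin


lmax, lmin = location_tables_naive([4, 1, 4, 7, 3, 5, 2, 3, 4])
print(lmax[1:])  # [0, 0, 1, 3, 2, 3, 2, 5, 3]
print(lmin[1:])  # [0, 1, 1, 0, 3, 4, 5, 5, 3]
```
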


In Lemma 1 we show that location tables can be computed as fast as sorting 
all the symbols of the pattern. 


Lemma 1. Let x be a sequence of length m and let sort(x) be the time required 
to sort all the elements of x. Then location tables of x can be computed in 
O(sort(x)) time. 


PROOF. Let us sort positions of x with respect to their contents (the symbols 
they contain). In case of equal contents the smaller positions come first. Let 
S be the resulting sequence of positions. Then LMax[j] is the nearest smaller
value to the left of the entry S[i] = j in S (if there is no such value, LMax[j] = 0),
see Table 2. The LMin table is computed similarly, by taking the nearest smaller
value to the right in a sequence S′ constructed exactly as the sequence S but with
a reversed order of positions with equal contents.


x[S[i]]     | 1  2  3  3  4  4  4  5  7
S[i]        | 2  7  5  8  1  3  9  6  4
LMax[S[i]]  | 0  2  2  5  0  1  3  3  3


Table 2: Computation of the LMax table for the pattern from Table 1, as in the proof of 
Lemma 1. 


It is folklore knowledge that the problem of computing nearest smaller values 
for all elements of a sequence, also known as the “all nearest smaller values” 
problem, can be solved in linear time by a stack-based algorithm. 
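
The construction from the proof of Lemma 1 can be sketched as follows: sort the positions, then run the stack-based nearest-smaller-values scan over the sorted sequence. The names `location_tables` and `nearest_smaller_left` are assumptions of this example. Python's comparison-based `sorted` is used for brevity, so the sketch runs in O(m log m) time; reaching O(m) would require a linear-time integer sort, which is not shown here.

```python
def location_tables(x):
    """LMax and LMin of x as 1-indexed lists (entry 0 is unused)."""
    m = len(x)

    def nearest_smaller_left(order):
        # For each position p in 'order', store the nearest smaller position
        # to its left in 'order' (0 if none), using the classical stack scan.
        table, stack = [0] * (m + 1), []
        for p in order:
            while stack and stack[-1] > p:
                stack.pop()
            table[p] = stack[-1] if stack else 0
            stack.append(p)
        return table

    positions = range(1, m + 1)
    # S: positions sorted by content, equal contents by smaller position first.
    s = sorted(positions, key=lambda p: (x[p - 1], p))
    # S': as S, but equal contents by larger position first.
    s_rev = sorted(positions, key=lambda p: (x[p - 1], -p))
    # "Nearest smaller to the right in S'" is the same scan on S' reversed.
    return nearest_smaller_left(s), nearest_smaller_left(s_rev[::-1])


lmax, lmin = location_tables([4, 1, 4, 7, 3, 5, 2, 3, 4])
print(lmax[1:])  # [0, 0, 1, 3, 2, 3, 2, 5, 3]
print(lmin[1:])  # [0, 1, 1, 0, 3, 4, 5, 5, 3]
```
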


The following lemma provides a justification for introducing the location 
tables in the context of consecutive permutation pattern matching. 


Lemma 2. Assume that

$$x[1..t] \sim y[1..t], \quad t < |x|, |y|, \quad a = \mathrm{LMax}_x[t+1], \quad b = \mathrm{LMin}_x[t+1].$$

Then

$$x[1..t+1] \sim y[1..t+1] \iff y[a] \le y[t+1] \le y[b].$$

In case a or b is equal to 0, we omit the respective inequality in the condition.


PROOF. (⇒) By the definition of the location tables, we have x[a] ≤ x[t + 1] ≤ x[b].
Now order-isomorphism of x[1..t + 1] and y[1..t + 1] implies that
y[a] ≤ y[t + 1] ≤ y[b].

(⇐) We need to show that x[1..t + 1] ~ y[1..t + 1]. We have x[1..t] ~ y[1..t],
hence it suffices to prove that, for i ≤ t,

$$x[i] \le x[t+1] \iff y[i] \le y[t+1].$$

Assume that x[i] ≤ x[t + 1] for some i ∈ {1, ..., t}. By the definition of the LMax
table, we have x[i] ≤ x[a]; by the order-isomorphism of x[1..t] and y[1..t], we
have y[i] ≤ y[a]; finally, by the assumption of the lemma, y[a] ≤ y[t + 1], hence
y[i] ≤ y[t + 1]. In a similar way we show that x[i] ≥ x[t + 1] implies y[i] ≥ y[t + 1],
which yields the requested equivalence.


Let us make a natural assumption that the symbols of x can be sorted in
O(m) time, e.g. they are elements of the set {1, ..., m^{O(1)}}. Under this
assumption, Lemma 2 (together with Lemma 1) implies an O(1) time incremental
criterion for checking if a sequence is isomorphic to a prefix of the pattern. This
is the basic tool used in the pattern matching algorithm presented in the next
section:


Lemma 3. Let x be a pattern of length m whose symbols can be sorted in O(m) 
time. After O(m) time preprocessing one can answer queries of the following 
form: “assuming that x[1..t] ~ y[1..t], check if x[1..t +1] ~ y[1..t+1]” for 
any sequence y in constant time. 
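
The constant-time query of Lemma 3 might be sketched as below. The names `location_tables` and `extends` and the `offset` parameter (which lets the same test be applied to a window of y) are assumptions of this example. One further assumption is labeled in the code: the sketch asks for a strict inequality whenever the two pattern symbols involved differ, and for equality when they coincide, so that repeated symbols are handled safely; for sequences of pairwise distinct symbols this is the same comparison as in Lemma 2.

```python
def location_tables(x):
    """LMax and LMin of x as 1-indexed lists (sort plus stack scan)."""
    m = len(x)

    def nearest_smaller_left(order):
        table, stack = [0] * (m + 1), []
        for p in order:
            while stack and stack[-1] > p:
                stack.pop()
            table[p] = stack[-1] if stack else 0
            stack.append(p)
        return table

    s = sorted(range(1, m + 1), key=lambda p: (x[p - 1], p))
    s_rev = sorted(range(1, m + 1), key=lambda p: (x[p - 1], -p))
    return nearest_smaller_left(s), nearest_smaller_left(s_rev[::-1])


def extends(x, lmax, lmin, y, t, offset=0):
    """Assuming x[1..t] ~ y[offset+1..offset+t] (1-indexed), test in O(1)
    whether the isomorphism extends to length t + 1."""
    a, b = lmax[t + 1], lmin[t + 1]
    xn, yn = x[t], y[offset + t]  # the newly appended symbols
    if a:
        xa, ya = x[a - 1], y[offset + a - 1]
        # Assumption of this sketch: mirror ties exactly, strict otherwise.
        if not (ya == yn if xa == xn else ya < yn):
            return False
    if b:
        xb, yb = x[b - 1], y[offset + b - 1]
        if not (yb == yn if xb == xn else yn < yb):
            return False
    return True


x = [4, 1, 4, 7, 3, 5, 2, 3, 4]
lmax, lmin = location_tables(x)
y = [8, 1, 8, 10, 6, 9, 4, 6, 8]
# Growing the prefix one symbol at a time verifies x ~ y with m constant-time tests.
print(all(extends(x, lmax, lmin, y, t) for t in range(len(x))))  # True
```
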


3. Consecutive permutation pattern matching 


Let x be a pattern of length m. The order-borders table P for x is defined 
as follows: 


$$P[1] = 0, \quad P[i] = \max\{j < i : x[1..j] \sim x[i-j+1..i]\} \quad \text{for } i \ge 2,$$


see Table 3 as an example. 


Table 3: The order-borders table P for the pattern x = 25147368. 


The algorithm computing the order-borders table is similar to the algorithm 
computing (regular) borders in the Morris-Pratt algorithm. 


Algorithm Compute the table P 
    P[0] := -1; t := -1;
    for i := 1 to m do
        invariant: x[1..t] ~ x[i - t..i - 1]
        while t ≥ 0 and x[1..t + 1] ≁ x[i - t..i] do t := P[t];
        t := t + 1; P[i] := t;


The test x[1..t + 1] ~ x[i - t..i] can be done in O(1) time due to Lemma 3
and the invariant of the while-loop. The number of such tests is linear, which
follows from the complexity analysis of the Morris-Pratt algorithm (note that t
decreases after each failed test, whereas it is incremented only once per iteration
of the for-loop). Consequently we obtain the following lemma.


Lemma 4. The order-borders table can be computed in linear time. 
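
A runnable sketch of the computation of the order-borders table is given below. The helper names `location_tables`, `extends` and `order_borders` are assumptions of this example, and the location tables are built with a comparison sort, so the preprocessing here takes O(m log m) rather than O(m) time. As a labeled assumption of the sketch, the extension test mirrors ties exactly and is strict otherwise, which matters only for patterns with repeated symbols.

```python
def location_tables(x):
    """LMax and LMin of x as 1-indexed lists (sort plus stack scan)."""
    m = len(x)

    def nearest_smaller_left(order):
        table, stack = [0] * (m + 1), []
        for p in order:
            while stack and stack[-1] > p:
                stack.pop()
            table[p] = stack[-1] if stack else 0
            stack.append(p)
        return table

    s = sorted(range(1, m + 1), key=lambda p: (x[p - 1], p))
    s_rev = sorted(range(1, m + 1), key=lambda p: (x[p - 1], -p))
    return nearest_smaller_left(s), nearest_smaller_left(s_rev[::-1])


def extends(x, lmax, lmin, y, t, offset=0):
    """O(1) test: does x[1..t] ~ y[offset+1..offset+t] extend to length t + 1?"""
    a, b = lmax[t + 1], lmin[t + 1]
    xn, yn = x[t], y[offset + t]
    if a and not (y[offset + a - 1] == yn if x[a - 1] == xn else y[offset + a - 1] < yn):
        return False
    if b and not (y[offset + b - 1] == yn if x[b - 1] == xn else yn < y[offset + b - 1]):
        return False
    return True


def order_borders(x):
    """Order-borders table P[0..m], with the sentinel P[0] = -1."""
    m = len(x)
    lmax, lmin = location_tables(x)
    P = [-1] + [0] * m
    t = -1
    for i in range(1, m + 1):
        # Invariant: x[1..t] ~ x[i-t..i-1]; the window starts at offset i-t-1.
        while t >= 0 and not extends(x, lmax, lmin, x, t, offset=i - t - 1):
            t = P[t]
        t += 1
        P[i] = t
    return P


print(order_borders([2, 1, 4, 5, 3]))  # [-1, 0, 1, 1, 1, 2]
```
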
Figure 1: An order-occurrence of the pattern 2 1 4 5 3 in the text 5 6 3 8 10 7 1 9 10 8. There is
also a second order-occurrence of this pattern formed by the last 5 symbols of the text.


A pattern x of length m order-occurs at position i of a text y if x ~ y[i + 1..i + m],
see also Fig. 1. Let n be the length of y. We can find all order-occurrences
of x in y in linear time using the algorithm below (the pseudocode
resembles the implementation of Morris-Pratt pattern matching algorithm as 
given in [8]). 


Algorithm Modified algorithm of Morris and Pratt 
    i := 0; j := 0;
    while i ≤ n - m do begin
        invariant: x[1..j] ~ y[i + 1..i + j]
        while j < m and x[1..j + 1] ~ y[i + 1..i + j + 1] do
            j := j + 1;
        if j = m then write i;
        i := i + (j - P[j]); j := max(0, P[j]);
    end


Theorem 5 summarizes the linear time algorithm for consecutive permuta- 
tion pattern matching. 


Theorem 5. All order-occurrences of a pattern in a given text can be computed 
in linear time. 


PROOF. By Lemma 4, the order-borders table for the pattern can be computed 
in linear time. Recall that this algorithm involves the computation of location 
tables, see Lemma 1. 

The procedure for finding order-occurrences mimics the Morris-Pratt pattern 
matching algorithm, but instead of testing equality of symbols of the pattern 
and the text we check order-isomorphism of a prefix of the pattern and a factor 
of the text. Due to the invariant in the pseudocode, each such test can be done 
in constant time using Lemma 3. The number of remaining operations in the 
pattern matching is linear just as in the original Morris-Pratt algorithm. 
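
Putting the pieces together, the whole procedure might be sketched as follows. All function names are assumptions of this example, the pattern is assumed to be non-empty, and the text of Fig. 1 is read here as 5 6 3 8 10 7 1 9 10 8. Two deviations from the statement of Theorem 5 are deliberate and labeled: the location tables use a comparison sort (O(m log m) preprocessing instead of O(m)), and the extension test mirrors ties exactly and is strict otherwise, so that repeated symbols are handled safely.

```python
def location_tables(x):
    """LMax and LMin of x as 1-indexed lists (sort plus stack scan)."""
    m = len(x)

    def nearest_smaller_left(order):
        table, stack = [0] * (m + 1), []
        for p in order:
            while stack and stack[-1] > p:
                stack.pop()
            table[p] = stack[-1] if stack else 0
            stack.append(p)
        return table

    s = sorted(range(1, m + 1), key=lambda p: (x[p - 1], p))
    s_rev = sorted(range(1, m + 1), key=lambda p: (x[p - 1], -p))
    return nearest_smaller_left(s), nearest_smaller_left(s_rev[::-1])


def extends(x, lmax, lmin, y, t, offset):
    """O(1) test: does x[1..t] ~ y[offset+1..offset+t] extend to length t + 1?"""
    a, b = lmax[t + 1], lmin[t + 1]
    xn, yn = x[t], y[offset + t]
    if a and not (y[offset + a - 1] == yn if x[a - 1] == xn else y[offset + a - 1] < yn):
        return False
    if b and not (y[offset + b - 1] == yn if x[b - 1] == xn else yn < y[offset + b - 1]):
        return False
    return True


def find_order_occurrences(x, y):
    """Return all i (0-based shifts) such that x ~ y[i+1..i+m]."""
    m, n = len(x), len(y)
    lmax, lmin = location_tables(x)
    # Order-borders table, computed as in the Morris-Pratt failure function.
    P, t = [-1] + [0] * m, -1
    for i in range(1, m + 1):
        while t >= 0 and not extends(x, lmax, lmin, x, t, i - t - 1):
            t = P[t]
        t += 1
        P[i] = t
    # Scan the text, keeping the invariant x[1..j] ~ y[i+1..i+j].
    result, i, j = [], 0, 0
    while i <= n - m:
        while j < m and extends(x, lmax, lmin, y, j, i):
            j += 1
        if j == m:
            result.append(i)
        i, j = i + (j - P[j]), max(0, P[j])
    return result


print(find_order_occurrences([2, 1, 4, 5, 3],
                             [5, 6, 3, 8, 10, 7, 1, 9, 10, 8]))  # [1, 5]
```

The reported shifts are 0-based, matching the convention in the text that the pattern order-occurs at position i when it is isomorphic to y[i + 1..i + m].
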